When two PCIe devices are connected (e.g., GPU ↔ motherboard slot), they don’t immediately start blasting data.
Instead, they go through an automatic negotiation and calibration process, called link training, to agree on a link width and data rate and to lock and align their receivers.

This is done via training sequences (TS1, TS2 ordered sets) exchanged between devices.
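To make the idea of a training sequence more concrete, here is a minimal sketch of how a TS1/TS2 ordered set could be modeled for 8b/10b links (Gen1/Gen2). It assumes the commonly documented 16-symbol layout: COM, link number, lane number, N_FTS, data rate identifier, training control, then ten identifier symbols. The class name `TrainingSet` and its fields are hypothetical, the training control symbol is simply left at zero, and Gen3+ layouts (which reuse some symbols for equalization) are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

COM = 0xBC     # byte value of K28.5, the control symbol that opens the set
PAD = 0xF7     # byte value of K23.7, sent while link/lane numbers are unassigned
TS1_ID = 0x4A  # D10.2, fills symbols 6..15 of a TS1
TS2_ID = 0x45  # D5.2, fills symbols 6..15 of a TS2

# Data rate identifier (symbol 4): one bit per supported rate in GT/s.
RATE_BITS = {2.5: 1 << 1, 5.0: 1 << 2, 8.0: 1 << 3}


@dataclass
class TrainingSet:
    """Hypothetical model of one TS1/TS2 ordered set (8b/10b only)."""
    link: Optional[int]        # None -> PAD (not yet negotiated)
    lane: Optional[int]        # None -> PAD (not yet negotiated)
    n_fts: int                 # number of Fast Training Sequences requested
    rates: Tuple[float, ...]   # data rates this port advertises
    is_ts2: bool = False

    def symbols(self) -> List[int]:
        rate_id = 0
        for rate in self.rates:
            rate_id |= RATE_BITS[rate]
        ident = TS2_ID if self.is_ts2 else TS1_ID
        head = [
            COM,
            PAD if self.link is None else self.link,
            PAD if self.lane is None else self.lane,
            self.n_fts,
            rate_id,
            0x00,  # training control: no special requests in this sketch
        ]
        return head + [ident] * 10  # 6 + 10 = 16 symbols


if __name__ == "__main__":
    ts1 = TrainingSet(link=None, lane=None, n_fts=0x20, rates=(2.5, 5.0))
    print([hex(s) for s in ts1.symbols()])
```

Note that the byte values alone do not capture the control-versus-data distinction that 8b/10b carries on the wire; the sketch only shows which field sits in which symbol position.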


Key Steps in Training & Initialization

  1. Detect link partner

    • Each transmitter checks its lanes for a receiver termination, which indicates that a link partner is physically present.
    • Exit from electrical idle is confirmed.
  2. Establish Link Width

    • The partners agree on how many lanes are active (x1, x4, x8, …).
    • If some lanes fail, width may be reduced (e.g., device supports x16, but only x8 trains successfully).
  3. Negotiate Data Rate

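    • Training always starts at 2.5 GT/s (Gen1), which every PCIe device must support.
    • Once the link is up, the partners attempt to switch to the highest rate both sides advertise (Gen2 = 5 GT/s, Gen3 = 8 GT/s, …).
    • If errors are excessive at the higher rate, the link retrains to a lower rate.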
  4. Lane Reversal

    • If lanes are connected in reverse order (e.g., on an x8 link, lane 0 is wired to lane 7, lane 1 to lane 6, and so on), a port that supports lane reversal can logically remap them.
    • This avoids the need for perfect lane routing in hardware.
  5. Polarity Inversion

    • PCIe uses differential pairs (positive + negative signals).
    • If the pair is accidentally flipped (P ↔ N), the PHY logic detects and corrects it automatically.
  6. Bit Lock (per lane)

    • The receiver recovers a clean clock from the transitions in the incoming bit stream.
    • This ensures bits are sampled at the right time.
  7. Symbol Lock (per lane)

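    • Finds alignment within the serial stream so bits are grouped into the correct units: 10-bit symbols with 8b/10b encoding (Gen1/Gen2), or 130-bit blocks with 128b/130b encoding (Gen3 and later).
    • Example (8b/10b): “Where does the 10-bit symbol boundary start?” The receiver answers this by looking for the COM (K28.5) symbol in the training sequences.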
  8. Lane-to-Lane Deskew

    • In multi-lane links (x4, x8, x16…), signals don’t all arrive at the same time due to PCB trace length differences.
    • Deskew aligns them so bytes across all lanes reassemble correctly.
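Several of the steps above can be sketched as small functions. This is a simplified software model, not how a PHY or the link state machine is actually implemented, and every function name here is hypothetical. It assumes that a usable width needs lanes 0..w-1 to have trained, that lane reversal mirrors the entire lane order, that symbol lock is shown for 8b/10b only (searching for the two 10-bit encodings of K28.5), and that deskew can be modeled by trimming each lane to a marker symbol sent on all lanes at once (real hardware delays the early lanes in small buffers instead).

```python
# Simplified model of several link-training outcomes (hypothetical names).
GEN_RATE_GTS = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0}

# K28.5 (COM) as 10-bit 8b/10b code groups, one per running disparity.
COM_PATTERNS = ("0011111010", "1100000101")


def negotiate_rate(max_gen_a, max_gen_b):
    """Return (generation, GT/s) for the highest rate both partners support."""
    gen = min(max_gen_a, max_gen_b)
    return gen, GEN_RATE_GTS[gen]


def negotiate_width(widths_a, widths_b, lane_ok):
    """Widest common width whose lanes 0..w-1 all trained, or 0 if none."""
    for w in sorted(set(widths_a) & set(widths_b), reverse=True):
        if w <= len(lane_ok) and all(lane_ok[:w]):
            return w
    return 0


def undo_lane_reversal(physical, reversed_order):
    """Return lanes in logical order; reversal mirrors the whole lane order."""
    return list(physical[::-1]) if reversed_order else list(physical)


def find_symbol_boundary(bits):
    """8b/10b symbol lock: bit offset of the first COM in a bit string, or -1."""
    hits = [i for i in (bits.find(p) for p in COM_PATTERNS) if i >= 0]
    return min(hits) if hits else -1


def deskew(lanes, marker="COM"):
    """Align lanes on a marker symbol that was sent on all lanes together."""
    aligned = [lane[lane.index(marker):] for lane in lanes]
    n = min(len(a) for a in aligned)
    return [a[:n] for a in aligned]


if __name__ == "__main__":
    # A Gen4 slot with a Gen3 card: the link settles at the lower of the two.
    print(negotiate_rate(4, 3))
    # An x16 card where lanes 8..15 failed to train falls back to x8.
    lane_ok = [True] * 8 + [False] * 8
    print(negotiate_width([1, 4, 8, 16], [1, 2, 4, 8, 16], lane_ok))
    # Lane 1 arrives one symbol later than lane 0; deskew lines them up.
    print(deskew([["COM", "A0", "A1", "A2"], ["x", "COM", "B0", "B1"]]))
```

Keeping each step as a separate pure function mirrors the way the text treats them as independent concerns; in real hardware they are interleaved within the training state machine.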

Result

When training completes successfully: